BACKGROUND OF THE INVENTION



The present invention relates to a virus ca- 
pable of inducing lymphadenopathies (hereinafter "LAS") 
and acquired immuno-depressive syndromes (hereinafter 
"AIDS"), to antigens of this virus, particularly in a 
purified form, and to a process for producing these an- 
tigens, particularly antigens of the envelope of this 
virus. The invention also relates to polypeptides, 
whether glycosylated or not, produced by the virus and 
to DNA sequences which code for such polypeptides. The 
invention further relates to cloned DNA sequences hybri- 
dizable to genomic RNA and DNA of the lymphadenopathy 
associated virus (hereinafter "LAV") of this invention 
and to processes for their preparation and their use. 
The invention still further relates to a stable probe 
including a DNA sequence which can be used for the 
detection of the LAV virus of this invention or related 
viruses or DNA proviruses in any medium, particularly 
biological, and in samples containing any of them. 

An important genetic polymorphism has been re- 
cognized for the human retrovirus which is the cause of 
AIDS and other diseases like LAS, AIDS-related complex 
(hereinafter "ARC") and probably some encephalopathies 
(for review, see Weiss, 1984). Indeed all of the isola- 
tes, analyzed until now, have had distinct restriction 
maps, even those recovered at the same place and time 
[Benn et al., 1985]. Identical restriction maps have
only been observed for the first two isolates which were
designated LAV [Alizon et al., 1984] and human T-cell
lymphotropic virus type 3 (hereinafter "HTLV-3") [Hahn
et al., 1984] and which appear to be exceptions. The
genetic polymorphism of the AIDS virus was better assessed
after the determination of the complete nucleotide
sequence of LAV [Wain-Hobson et al., 1985], HTLV-3
[Ratner et al., 1985 ; Muesing et al., 1985] and a third
isolate designated AIDS-associated retrovirus (hereinafter
"ARV2") [Sanchez-Pescador et al., 1985]. In
particular, it appeared that, besides the nucleic acid
variations responsible for the restriction map polymorphism,
isolates could differ significantly at the protein
level, especially in the envelope (up to 13 % of
difference between ARV and LAV), by both amino acid
substitutions and reciprocal insertions-deletions
[Rabson and Martin, 1985].

Nevertheless, such differences did not go so 
far as to destroy the immunological similarity of such 
isolates as evidenced by the capabilities of their 
similar proteins (e.g., core proteins of similar
nature, such as the p25 proteins, or similar envelope
glycoproteins, such as the 110-120 kD glycoproteins) to
immunologically cross-react. Accordingly, the proteins
of any of said LAV viruses can be used for the in vitro 
detection of antibodies induced in vivo and present in 
biological fluids obtained from individuals infected 
with the other LAV variants. Therefore, these viruses 
are grouped together as a class of LAV viruses (herein- 
after "LAV-1 viruses"). 



SUMMARY OF THE INVENTION

In accordance with this invention, a new virus 
has been discovered that is responsible for diseases 
clinically related to AIDS and that can be classified as 
a LAV-1 virus but that differs genetically from known 
LAV-1 viruses to a much larger extent than the known 
LAV-1 viruses differ from each other. The new virus
is basically characterized by the cDNA sequence which is
shown in Figures 7A to 7I, and this new virus is
hereinafter generally referred to as "LAV MAL".

Also in accordance with this invention, 
variants of the new virus are provided. The RNAs of 
these variants and the related cDNAs derived from said 
RNAs are hybridizable to corresponding parts of the cDNA 
of LAV MAL. The DNA of the new virus also is provided, as
well as DNA fragments derived therefrom hybridizable 
with the genomic RNA of LAV MAL , such DNA and DNA 
fragments particularly consisting of the cDNA or cDNA 
fragments of LAV MAL or of recombinant DNAs containing 
such cDNA or cDNA fragments. 

DNA recombinants containing the DNA or DNA 
fragments of LAV MAL or its variants are also provided.
It is of course understood that fragments which would 
include some deletions or mutations which would not 
substantially alter their capability of also hybridizing 
with the retroviral genome of LAV MAL are to be conside- 
red as forming obvious equivalents of the DNA or DNA 
fragments referred to hereinabove. 

Cloned probes are further provided which can 
be made starting from any DNA fragment according to the 
invention, as are recombinant DNAs containing such 
fragments, particularly any plasmids amplifiable in 
procaryotic or eucaryotic cells and carrying said 
fragments. Using cloned DNA containing a DNA fragment of 
LAV MAL as a molecular hybridization probe - either by
marking with radionucleotides or with fluorescent 
reagents - LAV virion RNA may be detected directly, for 
example, in blood, body fluids and blood products (e.g., 
in antihemophilic factors such as Factor VIII concentra- 
tes). A suitable method for achieving such detection 
comprises immobilizing LAV MAL on a support (e.g., a
nitrocellulose filter), disrupting the virion and
hybridizing with a labelled (radiolabelled or "cold"
fluorescent- or enzyme-labelled) probe. Such an approach
has already been developed for Hepatitis B virus in
peripheral blood (according to Scotto J. et al., Hepatology
(1983), 3, 379-384).

Probes according to the invention can also be 
used for rapid screening of genomic DNA derived from the 
tissue of patients with LAV related symptoms to see if 
the proviral DNA or RNA present in their tissues is 
related to LAV MAL. A method which can be used for such
screening comprises the following steps : extraction of
DNA from tissue, restriction enzyme cleavage of said
DNA, electrophoresis of the fragments and Southern
blotting of genomic DNA from tissues and subsequent
hybridization with labelled cloned LAV proviral DNA.
Hybridization in situ can also be used. Lymphatic fluids
and tissues and other non-lymphatic tissues of humans,
primates and other mammalian species can also be
screened to see if other evolutionarily related retroviruses
exist. The methods referred to hereinabove can be
used, although hybridization and washings would be done 
under non-stringent conditions. 

The DNA according to the invention can be used 
also for achieving the expression of LAV viral antigens 
for diagnostic purposes, as well as for the production 
of a vaccine against LAV. Fragments of particular 
advantage in that respect will be discussed later. The 
methods which can be used are multifold : 

a) DNA can be transfected into mammalian cells 
with appropriate selection markers by a variety of tech- 
niques, such as calcium phosphate precipitation, 
polyethylene glycol, protoplast-fusion, etc. 

b) DNA fragments corresponding to genes can be 
cloned into expression vectors for E. coli, yeast or
mammalian cells and the resultant proteins purified. 



c) The proviral DNA can be "shot-gunned"
(fragmented) into procaryotic expression vectors to 
generate fusion polypeptides. 

Recombinants, producing antigenically competent fusion 
proteins, can be identified by simply screening the 
recombinants with antibodies against LAV MAL antigens. 
Particular reference in this respect is made to those 
portions of the genome of LAV MAL which, in the figures, 
are shown to belong to open reading frames and which 
encode the products having the polypeptidic backbones 
shown.

Different polypeptides which appear in figures
7A to 7I are still further provided. Methods disclosed
in European application 0 178 978 and in PCT application
PCT/EP 85/00548, filed Oct. 18, 1985, are applicable for
the production of such peptides from LAV MAL. In this
regard, polypeptides are provided containing sequences
in common with polypeptides comprising antigenic determinants
included in the proteins encoded and expressed
by the LAV MAL genome. Means are also provided for the
detection of proteins of LAV MAL, particularly for the
diagnosis of AIDS or pre-AIDS or, to the contrary, for
the detection of antibodies against LAV MAL or its
proteins, particularly in patients afflicted with AIDS
or pre-AIDS or more generally in asymptomatic carriers
and in blood-related products. Further provided are
immunogenic polypeptides and more particularly
protective polypeptides for use in the preparation of
vaccine compositions against AIDS or related syndromes.

Yet further provided are polypeptide fragments 
having lower molecular weights and having peptide 
sequences or fragments in common with those shown in 
figures 7A to 7I. Fragments of smaller sizes can be
obtained by resorting to known techniques, for instance,
by cleaving the original larger polypeptide by enzymes
capable of cleaving it at specific sites. By way of
examples may be mentioned the enzyme of Staphylococcus
aureus V8, α-chymotrypsin, "mouse sub-maxillary gland
protease" marketed by the Boehringer company, and Vibrio
alginolyticus chemovar iophagus collagenase, which
specifically recognizes the peptides Gly-Pro, Gly-Ala,
etc.

Other features of this invention will appear 
in the following disclosure of data obtained starting 
from LAV MAL, in relation to the drawings.



BRIEF DESCRIPTION OF THE DRAWINGS

- Figs. 1A and 1B provide comparative restriction maps
of the genomes of LAV MAL as compared to LAV ELI (Applicants'
related new LAV virus which is the subject of
their copending application, filed herewith) and LAV BRU
(a known LAV isolate deposited at the Collection
Nationale des Cultures de Micro-organismes (hereinafter
"CNCM") of the Pasteur Institute, Paris, France under
No. I-232 on July 15, 1983) ;

- Fig. 2 shows comparative maps setting forth the 
relative positions of the open reading frames of the 
above genomes ;

- Figs. 3A-3F (also designated generally hereinafter
"fig. 3") indicate the relative correspondence between
the proteins (or glycoproteins) encoded by the open
reading frames, whereby amino acid residues of protein
sequences of LAV MAL are in vertical alignment with
corresponding amino acid residues (numbered) of
corresponding or homologous proteins or glycoproteins of
LAV BRU, as well as LAV ELI and ARV 2 ;

- Figs. 4A-4B (also designated generally hereinafter
"fig. 4") provide tables quantitating the sequence
divergence between homologous proteins of LAV BRU, LAV ELI
and LAV MAL ;

- Fig. 5 shows diagrammatically the degree of divergence 
of the different virus envelope proteins ; 

- Figs. 6A and 6B ("Fig. 6" when consulted together) 
render apparent the direct repeats which appear in the 
proteins of the different AIDS virus isolates. 

- Figs. 7A-7I show the full nucleotidic sequences of 
LAV MAL.

DETAILED DESCRIPTION OF THE INVENTION

CHARACTERIZATION AND MOLECULAR CLONING OF AN
AFRICAN ISOLATE.

The different AIDS virus isolates concerned 
are designated by three letters of the patient's name,
LAV BRU referring to the prototype AIDS virus isolated in
1983 from a French homosexual patient with LAS and
thought to have been infected in the USA in the preceding
years [Barre-Sinoussi et al., 1983]. LAV MAL was
recovered in 1985 from a 7-year old boy from Zaire.
Related LAV ELI was recovered in 1983 from a 24-year old
woman with AIDS from Zaire. Recovery and purification of
the LAV MAL virus were performed according to the method
disclosed in European Patent Application 84 401834/138
667 filed on September 9, 1984.

LAV MAL is indistinguishable from the previously
characterized isolates by its structural and biological
properties in vitro. Virus metabolic labelling and
immune precipitation by patient MAL sera, as well as
reference sera, showed that the proteins of LAV MAL had
the same molecular weight (hereinafter "MW") as, and
cross-reacted immunologically with those of, prototype
AIDS virus (data not shown) of the LAV-1 class.

Reference is again made to European Application
178 978 and International Application PCT/EP
85/00548 as concerns the purification, mapping and
sequencing procedures used herein. See also the 
discussion under the headings "Experimental Procedures" 
and "Significance of the Figures" hereinafter. 

Primary restriction enzyme analysis of the LAV MAL
genome was done by Southern blot with total DNA derived
from acutely infected lymphocytes, using cloned LAV
complete genome as probe. Overall cross-hybridization
was observed under stringent conditions, but the restriction
profile of the Zairian isolate was clearly
different. Phage lambda clones carrying the complete
viral genetic information were obtained and further
characterized by restriction mapping and nucleotide
sequence analysis. A clone (hereinafter "M-H11") was
obtained by complete HindIII restriction of DNA from
LAV MAL-infected cells, taking advantage of the existence
of a unique HindIII site in the long terminal repeat
(hereinafter "LTR"). M-H11 is thus probably derived from
unintegrated viral DNA since that species was at least
ten times more abundant than integrated provirus.

Figure 1B gives a comparison of the restriction
maps derived from the nucleotide sequences of
LAV ELI, LAV MAL and prototype LAV BRU, as well as from
three other Zairian isolates (hereinafter "Z1", "Z2",
and "Z3" respectively) previously mapped for seven
restriction enzymes [Benn et al., 1985]. Despite this
limited number, all of the profiles are clearly
different (out of the 23 sites making up the map of
LAV BRU, only seven are present in all six maps presented),
confirming the genetic polymorphism of the AIDS
virus. No obvious relationship is apparent between the
five Zairian maps, and all of their common sites are
also found in LAV BRU.

Conservation of the genetic organization. 

The genetic organization of LAV MAL as deduced
from the complete nucleotide sequences of its cloned
genome is identical to that found in other isolates,
i.e., 5' gag-pol-central region-env-F 3'. Most noticeable
is the conservation of the "central region" (fig. 2),
located between the pol and env genes, which is composed
of a series of overlapping open reading frames
(hereinafter "orf") previously designated Q, R, S, T,
and U in the ovine lentivirus visna [Sonigo et al.,
1985]. The product of orf S (also designated "tat") is
implicated in the transactivation of virus expression
[Sodroski et al., 1985 ; Arya et al., 1985] ; the
biological role of the product of orf Q (also designated
"sor" or "orf A") is still unknown [Lee et al., 1986 ;
Kang et al., 1986]. Of the three other orfs, R, T, and
U, only orf R is likely to be a seventh viral gene, for
the following reasons : the exact conservation of its
relative position with respect to Q and S (fig. 2), the
constant presence of a possible splice acceptor and of a
consensus AUG initiator codon, its similar codon usage
with respect to viral genes, and finally the fact that
the variation of its protein sequence within the different
isolates is comparable to that of gag, pol and Q
(see fig. 4).

Also conserved are the sizes of the U3, R and
U5 elements of the LTR (data not shown), the location
and sequence of their regulatory elements such as TATA
box and AATAAA polyadenylation signal, and their
flanking sequences, i.e., primer binding site
(hereinafter "PBS") complementary to the 3' end of tRNA(Lys)
and polypurine tract (hereinafter "PPT"). Most of the
genetic variability within the LTR is located in the 5'
half of U3 (which encodes a part of orf F) while the 3'
end of U3 and R, which carry most of the cis-acting
regulatory elements, promoter, enhancer and
trans-activating factor receptor [Rosen et al., 1985],
as well as the U5 element, are well-conserved.

Overall, it clearly appears that this Zairian
isolate, LAV MAL, is the same type of retrovirus as the
previously sequenced isolates of American or European
origin.

Variability of the viral proteins. 

Despite their identical genetic organization,
LAV ELI and LAV MAL show substantial differences in
the primary structure of their proteins. The amino acid
sequences of LAV ELI and LAV MAL proteins are presented in
figures 3A-3F, aligned with those of LAV BRU and ARV 2.
Their divergence was quantified as the percentage of
amino acid substitutions in two-by-two alignments (Fig.
4). The number of insertions and deletions that had to
be introduced in each of these alignments has also been
scored.
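
The two-by-two scoring described above can be sketched as follows. This is only an illustrative sketch, not the procedure actually used for Fig. 4: the function name, the gap character "-" and the convention of counting each run of consecutive gaps as a single insertion/deletion event are assumptions made for the example.

```python
def pairwise_divergence(seq_a, seq_b, gap="-"):
    """Score two pre-aligned protein sequences of equal length.

    Returns (percent_substitutions, indel_events). Substitutions are
    counted only over columns where both sequences carry a residue;
    a run of consecutive gaps on one side counts as one indel event
    (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = substitutions = indel_events = 0
    open_side = None  # which sequence currently carries an open gap run
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # column gapped in both sequences carries no information
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            if side != open_side:
                indel_events += 1  # a new gap run starts here
            open_side = side
            continue
        open_side = None
        compared += 1
        if a != b:
            substitutions += 1
    percent = 100.0 * substitutions / compared if compared else 0.0
    return percent, indel_events


# Hypothetical toy alignment (not taken from the figures).
print(pairwise_divergence("MKV-LTA", "MRVQL-A"))
```

With the toy alignment, one of the five residue-to-residue columns differs and two separate gap runs are present.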

Three general observations can be made. First,
the protein sequences of LAV ELI and LAV MAL are more
divergent from LAV BRU than are those of HTLV-3 and ARV 2
(Fig. 4A) ; similar results are obtained if ARV 2 is
taken as reference (not shown). The range of genetic
polymorphism between isolates of the AIDS virus is
considerably greater than previously observed. Second,
our two sequences confirm that the envelope is more
variable than the gag and pol genes. Here again, the
relatively small difference observed between the env of
LAV BRU and HTLV-3 appears as an exception. Third, the
mutual divergence of LAV ELI and LAV MAL (Fig. 4B) is
comparable to that between LAV BRU and either of them; as
far as we can extrapolate from only three sequenced
isolates from the USA and Europe and two (LAV ELI and
LAV MAL) from Africa, this is indicative of a wider
evolution of the AIDS virus in Africa.

gag and pol : Their greater degree of conservation
compared to the envelope is consistent with their
encoding important structural or enzymatic activities.
Of the three mature gag proteins, the p25, which was the
first recognized immunogenic protein of LAV [Barre-Sinoussi
et al., 1983], is also the best conserved
(fig. 3). In gag and pol, differences between isolates
are principally due to point mutations, and only a small
number of insertional or deletional events is observed.
Among these, we must note the presence in the overlapping
part of gag and pol of LAV BRU of an insertion of
12 amino acids (AA) which is encoded by the second copy
of a 36 bp direct repeat present only in this isolate
and in HTLV-3. This duplication was omitted because of a
computing error in the published sequence of LAV BRU
(position 1712, Wain-Hobson et al., 1985) but was indeed
present in the HTLV-3 sequences [Ratner et al., 1985 ;
Muesing et al., 1985].

env : Three segments can be distinguished in the envelope
glycoprotein precursor [Allan et al., 1985 ;
Montagnier et al., 1985 ; DiMarzoVeronese et al., 1985].
The first is the signal peptide (positions 1-33 in fig.
3), and its sequence appears as variable ; the second
segment (pos. 34-530) forms the outer membrane protein
(hereinafter "OMP" or "gp110") and carries most of the
genetic variations, and in particular almost all of the
numerous reciprocal insertions and deletions ; the third
segment (531-877) is separated from the OMP by a potential
cleavage site following a constant basic stretch
(Arg-Glu-Lys-Arg) and forms the transmembrane protein
(hereinafter "TMP" or "gp41") responsible for the anchorage
of the envelope glycoprotein in the cellular
membrane. A better conservation of the TMP than the OMP
has also been observed between the different murine
leukemia viruses (hereinafter "MLV") [Koch et al., 1983]
and could be due to structural constraints.

From the alignment of figure 3 and the
graphical representation of the envelope variability
shown in figure 5, we clearly see the existence of
conserved domains, with little or no genetic variation,
and hypervariable domains, in which even the alignment
of the different sequences is very difficult, because of
the existence of a large number of mutations and of
reciprocal insertions and deletions. We have not
included the sequence of the envelope of the HTLV-3
isolate since it is so close to that of LAV BRU (cf. fig.
4), even in the hypervariable domains, that it did not
add anything to the analysis. While this graphical
representation will be refined by more sequence data,
the general profile is already apparent, with three
hypervariable domains (Hy1, 2 and 3) all being located
in the OMP and separated by three well-conserved
stretches (residues 37-130, 211-289, and 488-530 of fig.
3 alignment) probably associated with important biological
functions.

In spite of the extreme genetic variability, 
the folding pattern of the envelope glycoprotein is 
probably constant. Indeed the position of virtually all 
of the cysteine residues is conserved within the diffe- 
rent isolates (fig. 3 and 5), and the only three varia- 
ble cysteines fall either in the signal peptide or in 
the very C-terminal part of the TMP. The hypervariable 
domains of the OMP are bounded by conserved cysteines, 
suggesting that they may represent loops attached to the 
common folding pattern. Also the calculated hydropathic 
profiles [Kyte and Doolittle, 1982] of the different en- 
velope proteins are remarkably conserved (not shown). 
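
A hydropathic profile of the kind cited above can be computed with a sliding-window average over the published Kyte and Doolittle hydropathy index. The following is a minimal sketch under stated assumptions: the window length of 9 residues and the function name are illustrative choices, not parameters reported in this disclosure.

```python
# Kyte and Doolittle (1982) hydropathy index, one-letter amino acid codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    The window length is an assumption of this sketch; element i of the
    result covers residues i .. i + window - 1 (0-based).
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide, for illustration only.
profile = hydropathy_profile("MRVKGIRKNYQHLWRWGTML", window=9)
print(len(profile), [round(value, 2) for value in profile[:3]])
```

Profiles computed this way for two aligned envelope sequences can then be compared position by position to judge how well the hydrophobic and hydrophilic stretches coincide.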

About half of the potential N-glycosylation
sites, Asn-X-Ser/Thr, found in the envelopes of the
Zairian isolates map to the same positions in LAV BRU
(17/26 for LAV ELI and 17/28 for LAV MAL). The other sites
appear to fall within variable domains of env,
suggesting the existence of differences in the extent of
envelope glycosylation between different isolates.
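
Potential N-glycosylation sites of the form Asn-X-Ser/Thr can be located in a protein sequence with a simple pattern search. The sketch below is illustrative only: the function name and the example peptide are hypothetical, and the optional exclusion of proline at position X is a commonly applied refinement that is not stated in the text above.

```python
import re


def n_glycosylation_sites(sequence, exclude_proline=False):
    """Return 1-based positions of the Asn in each Asn-X-Ser/Thr motif.

    By default X is any residue, as in the text. Setting
    exclude_proline=True skips motifs where X is Pro (an assumption of
    this sketch). A lookahead is used so overlapping motifs are all found.
    """
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile("(?=N" + middle + "[ST])")
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


# Hypothetical peptide, not taken from the figures.
peptide = "MNGTANNSTKNPS"
print(n_glycosylation_sites(peptide))
print(n_glycosylation_sites(peptide, exclude_proline=True))
```

To reproduce a count such as 17/26, the positions found in each isolate would first have to be converted to the coordinates of the common alignment before being compared with those of the reference isolate.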

Other viral proteins : Of the three other identified
viral proteins, the p27 encoded by orf F, 3' of env
[Allan et al., 1985b] is the most variable (fig. 4). The
proteins encoded by orfs Q and S of the central region
are remarkable by their absence of insertions/deletions.
Surprisingly, a high frequency of amino acid substitutions,
comparable to that observed in env, is found for
the product of orf S (trans-activating factor). On the
other hand, the protein encoded by orf Q is no more
variable than gag. Also noticeable is the lower
variation of the proteins encoded by the central regions
of LAV ELI and LAV MAL.

With the availability of the complete nucleotide
sequence from five independent isolates, some
general features of the AIDS virus' genetic variability
are now emerging. Firstly, its principal cause is point
mutations which very often result in amino acid substitutions
and which are more frequent in the 3' part of
the genome (orf S, env and orf F). Like all RNA viruses,
the retroviruses are thought to be highly subject to
mutations caused by errors of the RNA polymerases during
their replication, since there is no proofreading of
this step [Holland et al., 1982 ; Steinhauer and
Holland, 1986].

Another source of genetic diversity is
insertions/deletions. From the figure 3 alignments,
insertional events seem to be implicated in most of the
cases, since otherwise deletions should have occurred in
independent isolates at precisely the same locations.
Furthermore, upon analyzing these insertions, we have
observed that they most often represent one of the two
copies of a direct repeat (fig. 6). Some are perfectly
conserved like the 36 bp repeat in the gag-pol overlap
of LAV BRU (fig. 6-a) ; others carry point mutations
resulting in amino acid substitutions, and as a consequence,
they are more difficult to observe, though
clearly present, in the hypervariable domains of env
(cf. fig. 6-g and -h). As noted for point mutations, env
gene and orf F also appear as more susceptible to that
form of genetic variation than the rest of the genome.
The degree of conservation of these repeats must be 
related to their date of occurrence in the analyzed 
sequences : the more degenerated, the more ancient. A 
very recent divergence of LAV BRU and HTLV-3 is suggested 
by the extremely low number of mismatched AA between 

their homologous proteins. However, one of the LAV BRU
repeats (located in the Hy1 domain of env, fig. 6-f) is
not present in HTLV-3, indicating that this generation 
of tandem repeats is a rapid source of genetic diver- 
sity. We have found no traces of such a phenomenon, even 
when comparing very closely related viruses, such as the 
Mason-Pfizer monkey virus (hereinafter "MPMV") [Sonigo 
et al., 1986], and an immunosuppressive simian virus
(hereinafter "SRV-1") [Power et al., 1986]. Insertion or
deletion of one copy of a direct repeat has been occasionally
reported in mutant retroviruses [Shimotohno and
Temin, 1981 ; Darlix, 1986], but the extent to which we
observe this phenomenon is unprecedented. The molecular 
basis of these duplications is unclear, but could be the 
"copy-choice" phenomenon, resulting from the diploidy of 
the retroviral genome [Varmus and Swanstrom, 1984 ; 
Clark and Mak, 1983]. During the synthesis of the first- 
strand of the viral DNA, jumps are known to occur from 
one RNA molecule to another, especially when a break or 
a stable secondary structure is present on the template; 
an inaccurate re-initiation on the other RNA template 
could result in the generation (or the elimination) of a 
short direct repeat. 
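
The search for insertions that form one of the two copies of a direct repeat can be sketched as a scan for adjacent identical segments in a nucleotide sequence. This is a simplified illustration, not the analysis behind fig. 6: it assumes perfectly conserved, immediately adjacent copies, whereas the repeats discussed above may carry point mutations. The function name, length bounds and toy sequence are hypothetical.

```python
def find_tandem_repeats(sequence, min_len=6, max_len=40):
    """List exact tandem direct repeats as (1-based start, repeated unit).

    A hit at position i with unit length k means that
    sequence[i:i+k] == sequence[i+k:i+2k]. Degenerate (mutated) copies
    are not detected by this sketch.
    """
    hits = []
    n = len(sequence)
    for k in range(min_len, max_len + 1):
        for i in range(0, n - 2 * k + 1):
            unit = sequence[i:i + k]
            if unit == sequence[i + k:i + 2 * k]:
                hits.append((i + 1, unit))
    return hits


# Toy sequence with one 8-nucleotide unit duplicated in tandem.
toy = "ATGC" + "GATTACAG" * 2 + "TTT"
print(find_tandem_repeats(toy, min_len=8, max_len=8))
```

Detecting the more ancient, degenerated repeats would require tolerating mismatches between the two copies, for example by scoring the copies against each other with a substitution threshold.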

Genetic variability and subsequent antigenic
modifications have often been developed by microorganisms
as a means for avoiding the host's immune
response, either by modifying their epitopes during the 
course of the infection, as in trypanosomes [Borst and 
Cross, 1982], or by generating a large repertoire of 
antigens, as observed in influenza virus [Webster et 
al., 1982]. As the human AIDS virus is related to animal 
lentiviruses [Sonigo et al., 1985 ; Chiu et al . , 1985], 
its genetic variability could be a source of antigenic 
variation, as can be observed during the course of the 
infection by the ovine lentivirus visna [Scott et al . , 
1979 ; Clements et al., 1980] or by the equine infec- 
tious anemia virus (hereinafter "EIAV") [Montelaro et 
al., 1984]. However, a major discrepancy with these
animal models is the extremely low, and possibly
nonexistent, neutralizing activity of the sera of
individuals infected by the AIDS virus, whether they are
healthy carriers, displaying minor symptoms, or
afflicted with AIDS [Weiss et al., 1985 ; Clavel et al.,
1985]. Furthermore, even for the visna virus the exact
role of antigenic variation in the pathogenesis is
unclear [Thormar et al., 1983 ; Lutley et al., 1983]. We
rather believe that genetic variation represents a
general selective advantage for lentiviruses by allowing 
an adaptation to different environments, for example by 
modifying their tissue or host tropisms. In the particu- 
lar case of the AIDS virus, rapid genetic variations are 
tolerated, especially in the envelope. This could allow 
the virus to become adapted to different "micro-environ- 
ments" of the membrane of their principal target cells, 
namely the T4 lymphocytes. These "micro-environments" 
could result from the immediate vicinity of the virus 
receptor to polymorphic surface proteins, differing 
either between individuals or between clones of
lymphocytes.

Conserved domains in the AIDS virus envelope



Since the proteins of most of the isolates are 
antigenically cross-reactive, the genotypic differences 
do not seem to affect the sensitivity of current diagnostic
tests, based upon the detection of antibodies to the
AIDS virus and using purified virions as antigens. They 
nevertheless have to be considered for the development 
of the "second-generation" tests, that are expected to 
be more specific, and will use smaller synthetic or 
genetically-engineered viral antigens. The identifi- 
cation of conserved domains in the highly immunogenic 
envelope glycoprotein and the core structural proteins 
(gag) is very important for these tests. The conserved 
stretch found at the end of the OMP and the beginning of 
the TMP (490-620, fig. 3) could be a good candidate, 
since a bacterial fusion protein containing this domain 
was well-detected by AIDS patients' sera [Chang et al., 
1985]. 

The envelope, specifically the OMP, mediates 
the interaction between a retrovirus and its specific 
cellular receptor [DeLarco and Todaro, 1976 ; Robinson 
et al., 1980]. In the case of the AIDS virus, in vitro
binding assays have shown the interaction of the envelope
glycoprotein gp110 with the T4 cellular surface
antigen [McDougal et al., 1986], already thought to be
closely associated with the virus receptor [Klatzmann et
al., 1984 ; Dalgleish et al., 1984]. Identification of
the AIDS virus envelope domains that are responsible for
this interaction (receptor-binding domains) appears to
be fundamental for understanding of the host-viral
interactions and for designing a protective vaccine,
since an immune response against these epitopes could
possibly elicit neutralizing antibodies. As the AIDS
virus receptor is at least partly formed of a constant
structure, the T4 antigen, the binding site of the
envelope is unlikely to be exclusively encoded by
domains undergoing drastic genetic changes between
isolates, even if these could be implicated in some kind
of an "adaptation". One or several of the conserved
domains of the OMP (residues 37-130, 211-289, and 
488-530 of fig. 3 alignment), brought together by the 
folding of the protein, must play a part in the virus- 
receptor interaction, and this can be explored with 
synthetic or genetically-engineered peptides derived 
from these domains, either by direct binding assays or 
indirectly by assaying the neutralizing activity of 
specific antibodies raised against them. 

African AIDS viruses

Zaire and the neighbouring countries of 
Central Africa are considered as an area endemic with 
the AIDS virus infection, and the possibility that the 
virus has emerged in Africa has become a subject of
intense controversy (see Norman, 1985). From the present
study, it is clear that the genetic organization of
Zairian isolates is the same as that of American
isolates, thereby indicating a common origin. The very 
important sequence differences observed between the 
proteins are consistent with a divergent evolutionary 
process. In addition, the two African isolates are 
mutually more divergent than the American isolates 
already analyzed ; as far as that observation can be 
extrapolated, it suggests a longer evolution of the 
virus in Africa and is also consistent with the fact 
that a larger fraction of the population is exposed than 
in developed countries. 

A novel human retrovirus with morphology and
biological properties (cytopathogenicity, T4 tropism)
similar to those of LAV, but nevertheless clearly
genetically and antigenically distinct from it, was
recently isolated from two patients with AIDS
originating from Guinea Bissau, West Africa [Clavel et
al., 1986]. In neighboring Senegal, the population was
seemingly exposed to a retrovirus also distinct from LAV
but apparently non-pathogenic [Barin et al., 1985 ;
Kanki et al., 1986]. Both of these novel African retroviruses
seem to be antigenically related to the simian
T-cell lymphotropic virus (hereinafter "STLV-III") shown
to be widely present in healthy African green monkeys
and other simian species [Kanki et al., 1985]. This
raises the possibility of a large group of African
primate lentiviruses, ranging from the apparently
non-pathogenic simian viruses to the LAV-type viruses.
Their precise relationship will only be known after 
their complete genetic characterization, but it is 
already very likely that they have evolved from a common 
progenitor. The important genetic variability we have 
observed between isolates of the AIDS virus in Central 
Africa is probably a hallmark of this entire group and 
may account for the apparently important genetic 
divergence between its members (loss of 

cross-antigenicity in the envelopes). In this sense, the 
conservation of the tropism for the T4 lymphocytes 
suggests that it is a major advantage aquired by these 
retroviruses. 

EXPERIMENTAL PROCEDURES

Virus isolation 

LAV MAL was isolated from the peripheral blood
lymphocytes of the patient as described [Barre-Sinoussi
et al., 1983]. Briefly, the lymphocytes were fractionated
and co-cultivated with phytohaemagglutinin-stimulated
normal human lymphocytes in the presence of
interleukin 2 and anti-alpha interferon serum. Viral
production was assessed by cell-free reverse
transcriptase (hereinafter "RT") activity assay in the
cultures and by electron microscopy.

Molecular cloning

Normal donor lymphocytes were acutely infected
(10 cpm of RT activity/10^6 cells) as described
[Barre-Sinoussi et al., 1983], and total DNA was extracted at
the beginning of the RT activity peak. A lambda library
using the L47-1 vector [Loenen and Brammar, 1982] was
constructed by partial HindIII digestion of the DNA as
already described [Alizon et al., 1984]. DNA from
infected cells was digested to completion with HindIII,
and the 9-10 kb fraction was selected on 0.8 % low
melting point agarose gel and ligated into L47-1 HindIII
arms. About 2 x 10^5 plaques for LAV MAL, obtained by in
vitro packaging (Amersham), were plated on E. coli LA101
and screened in situ, under stringent conditions, using
the 9 kb SacI insert of the clone lambda J19 [Alizon et
al., 1984] carrying most of the LAV BRU genome as probe.
Clones displaying positive signals were plaque-purified
and propagated on E. coli C600 recBC, and the recombinant
phage M-H11 carrying the complete genetic
information of LAV MAL was further characterized by
restriction mapping.

Nucleotide sequence strategy

Viral fragments derived from M-H11 were
sequenced by the dideoxy chain terminator procedure
[Sanger et al., 1977] after "shotgun" cloning in the
M13mp8 vector [Messing and Viera, 1982] as previously
described [Sonigo et al., 1985]. The viral genome of
LAV MAL is 9229 nucleotides long as shown in figs. 7A-7I.
Each nucleotide of LAV MAL was determined from more than
5 independent clones on average.

SIGNIFICANCE OF THE FIGURES

Figure 1 contains an analysis of AIDS virus isolates,
showing :

A/ Restriction maps of the inserts of phage
lambda clones derived from cells infected with LAV ELI
(hereinafter "E-H12") and with LAV MAL (M-H11). The
schematic genetic organization of the AIDS virus has
been drawn above the maps. The LTRs are indicated by
solid boxes. Restriction sites are indicated as follows:
A: AvaI; B: BamHI; Bg: BglII; E: EcoRI; H: HindIII;
Hc: HincII; K: KpnI; N: NdeI; P: PstI; S: SacI; and X: XbaI.
Asterisks indicate the HindIII cloning sites in lambda
L47-1 vector.

B/ A comparison of the sites for seven
restriction enzymes in six isolates : the prototype AIDS
virus LAV BRU, LAV MAL and LAV ELI, and Z1, Z2 and Z3.
Restriction sites are represented by symbols vertically
aligned with the symbols in fig. 1A, one symbol for each
of BglII, EcoRI, HincII, HindIII, KpnI, NdeI and SacI.

Figure 2 shows the genetic organization of the central
region in AIDS virus isolates. Stop codons in each phase
are represented as vertical bars. Vertical arrows
indicate possible AUG initiation codons. Splice acceptor (A)
and donor (D) sites identified in subgenomic viral mRNA
[Muesing et al., 1985] are shown below the graphic of
LAV BRU, and corresponding sites in LAV ELI and LAV MAL are
indicated. PPT indicates the repeat of the polypurine
tract flanking the 3'LTR. As observed in LAV BRU
[Wain-Hobson et al., 1985], the PPT is repeated 256
nucleotides 5' to the end of the pol gene in both the
LAV ELI and LAV MAL sequences, but this repeat is
degenerated at two positions in LAV ELI.

Figure 3 shows an alignment of the protein sequences of
four AIDS virus isolates. Isolate LAV BRU [Wain-Hobson et
al., 1985] is taken as reference; only differences with
LAV BRU are noted for ARV 2 [Sanchez-Pescador et al.,
1985] and the two Zairian isolates LAV MAL and LAV ELI. A
minimal number of gaps (-) were introduced in the
alignments. The NH2-termini of p25 gag and p18 gag are
indicated [Sanchez-Pescador, 1985]. The potential
cleavage sites in the envelope precursor [Allan et al.,
1985a; diMarzoVeronese, 1985] separating the signal
peptide (hereinafter "SP"), OMP and TMP are indicated as
vertical arrows; conserved cysteines are indicated by
black circles and variable cysteines are boxed. The one
letter code for each amino acid is as follows: A: Ala;
C: Cys; D: Asp; E: Glu; F: Phe; G: Gly; H: His; I: Ile;
K: Lys; L: Leu; M: Met; N: Asn; P: Pro; Q: Gln; R: Arg;
S: Ser; T: Thr; V: Val; W: Trp; Y: Tyr.

Figure 4 shows a quantitation of the sequence divergence
between homologous proteins of different isolates. Part
A of each table gives results deduced from two-by-two
alignments using the proteins of LAV BRU as reference,
part B, those of LAV ELI as reference. Sources: Muesing
et al., 1985 for HTLV-3; Sanchez-Pescador et al., 1985
for ARV 2 and Wain-Hobson et al., 1985 for LAV BRU. For
each box in the tables, the size in amino acids of the
protein (calculated from the first methionine residue or
from the beginning of the orf for pol) is given at the
upper left part. Below are given the number of deletions
(left) and insertions (right) necessary for the
alignment. The large numbers in bold face represent the
percentage of amino acid substitutions
(insertions/deletions being excluded). Two-by-two alignments were done
with computer assistance [Wilbur and Lipman, 1983],
using a gap penalty of 1, K-tuple of 1, and window of
20, except for the hypervariable domains of env, where
the number of gaps was made minimum, and which are
essentially aligned as shown in fig. 3. The sequence of
the predicted protein encoded by orf R of HTLV-3 has not
been compared because of a premature termination
relative to all other isolates.
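
The quantities reported in each box of Figure 4 can be illustrated by a minimal sketch, assuming that the two proteins are supplied as already aligned strings of equal length with "-" marking gaps. The function name and the short example sequences are hypothetical, and gaps are counted here per gapped position, which may differ from the way deletions and insertions were counted for the tables.

```python
def pairwise_divergence(ref, other):
    """Summarize a two-by-two alignment of gapped, equal-length sequences.

    `ref` is the reference protein and `other` the compared protein.
    Gapped positions are excluded from the substitution percentage.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")

    deletions = insertions = substitutions = compared = 0
    for a, b in zip(ref, other):
        if a == "-" and b == "-":
            continue                 # column carries no information
        if b == "-":
            deletions += 1           # residue of the reference is missing
        elif a == "-":
            insertions += 1          # extra residue relative to the reference
        else:
            compared += 1
            if a != b:
                substitutions += 1

    percent = 100.0 * substitutions / compared if compared else 0.0
    return {
        "deletions": deletions,
        "insertions": insertions,
        "substitutions": substitutions,
        "percent_substitutions": percent,
    }


# Hypothetical ten-residue example: one substitution (R -> K), one deletion.
result = pairwise_divergence("MGARASVLSG", "MGAKAS-LSG")
print(result)
```

The percentage is computed over the aligned, ungapped positions only, in keeping with the statement that insertions and deletions are excluded.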

Figure 5 shows the variability of the AIDS virus
envelope protein. For each position x of the alignment
of env (Fig. 3), variability V(x) was calculated as:

V(x) = (number of different amino acids at position x) /
(frequency of the most abundant amino acid at position x).

Gaps in the alignments are considered as another amino
acid. For an alignment of 4 proteins, V(x) ranges from 1
(identical AA in the 4 sequences) to 16 (4 different
AA). This type of representation has previously been
used in a compilation of the AA sequence of
immunoglobulin variable regions [Wu and Kabat, 1970]. Vertical
arrows indicate the cleavage sites; asterisks represent
potential N-glycosylation sites (N-X-S/T) conserved in
all four isolates; black triangles represent
conserved cysteine residues. Black lozenges mark the
three major hydrophobic domains: OMP, TMP and SP; and
the hypervariable domains: Hy1, 2 and 3.

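
The variability index of Figure 5 can be sketched as follows, assuming that the aligned sequences are given as equal-length strings in which a gap is written "-" and treated as one more amino acid. The function names and the three-column example alignment are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def variability(column):
    """V(x) for one alignment column given as a string of residues.

    V(x) = number of different residues / frequency of the most
    abundant residue, the frequency being a fraction of the column.
    """
    counts = Counter(column)
    n_different = len(counts)
    top_frequency = max(counts.values()) / len(column)
    return n_different / top_frequency


def variability_profile(aligned):
    """V(x) for every position of a list of aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")
    return [
        variability("".join(seq[i] for seq in aligned))
        for i in range(length)
    ]


# Hypothetical alignment of 4 sequences over 3 positions.
profile = variability_profile(["CAT", "CCT", "CD-", "CEK"])
print(profile)
```

With four sequences the index is 1 for a fully conserved column and 16 for a column carrying four different residues, which are the bounds stated above.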
Figure 6 shows the direct repeats in the proteins of 
different AIDS virus isolates. These examples are 
derived from the aligned sequences of gag (a, b), F 
(c,d) and env (e, f, g, h) shown in figure 3. The two 
elements of the direct repeat are boxed, while degene- 
rated positions are underlined. 

Figures 7A-7I show the complete cDNA sequence of LAV MAL
of this invention.

The invention thus pertains more specifically
to the proteins, glycoproteins and other polypeptides
including the polypeptidic structures shown in the
figures 1-7. The first and last amino acid residues of
these proteins, glycoproteins and polypeptides carry
numbers computed from a first amino acid of the
open-reading frames concerned, although these numbers do
not correspond exactly to those of the LAV MAL proteins
concerned, rather to the corresponding proteins of the
LAV BRU or sequences shown in figs. 3A, 3B and 3C. Thus a
number corresponding to a "first amino acid residue" of
a LAV MAL protein corresponds to the number of the first
amino-acyl residue of the corresponding LAV BRU protein
which, in any of figs. 3A, 3B or 3C, is in direct
alignment with the corresponding first amino acid of the
LAV MAL protein. Thus the sequences concerned can be read
from figs. 7A-7I to the extent where they do not appear
with sufficient clarity from Figs. 3A-3F.

The preferred protein sequences of this 
invention extend between the corresponding "first" and 
"last" amino acid residues. Also preferred are the 
protein(s)- or glycoprotein(s)-portions including part
of the sequences which follow :

OMP or gp110 proteins, including precursors : 
1 to 530 

OMP or gp110 without precursor : 
34-530 

Sequence carrying the TMP or gp41 protein : 
531-877, particularly 
680-700 

well conserved stretches of OMP : 

37-130, 
211-289 and 
488-530 

well conserved stretch found at the end of the OMP and 
the beginning of TMP : 

490-620. 

Proteins containing or consisting of the "well 
conserved stretches" are of particular interest for the 
production of immunogenic compositions and (preferably 
in relation to the stretches of the env protein) of 
vaccine compositions against the LAV-1 viruses. 
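
The residue ranges listed above are inclusive and numbered from 1, so that a stretch can be cut out of a sequence as sketched below. This is only an illustration: the function name is hypothetical, the sequence used is a placeholder rather than an env sequence, and the input is assumed to be numbered in the same frame as the alignment of fig. 3.

```python
# Well conserved stretches of OMP, as (first, last) residue numbers.
CONSERVED_OMP_STRETCHES = [(37, 130), (211, 289), (488, 530)]


def extract_stretch(sequence, first, last):
    """Return residues first..last (1-based, inclusive) of a sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range outside the sequence")
    return sequence[first - 1:last]


# Placeholder sequence of 600 residues, NOT a real envelope protein.
placeholder = "ACDEFGHIKL" * 60

for first, last in CONSERVED_OMP_STRETCHES:
    peptide = extract_stretch(placeholder, first, last)
    print(first, last, len(peptide))
```

A stretch numbered first..last always contains last - first + 1 residues, for example 94 residues for 37-130.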



The invention concerns more particularly all
the DNA fragments which have been more specifically
referred to in the drawings and which correspond to open
reading frames. It will be understood that one skilled
in the art will be able to obtain them all, for instance
by cleaving an entire DNA corresponding to the complete
genome of LAV MAL, such as by cleavage by a partial or
complete digestion thereof with a suitable restriction
enzyme and by the subsequent recovery of the relevant
fragments. The DNA disclosed above can be resorted to
also as a source of suitable fragments. The techniques
disclosed in PCT application for the isolation of the
fragments which can then be included in suitable
plasmids are applicable here too. Of course, other
methods can be used, some of which have been exemplified
in European Application No. 178,978, filed September 17,
1985. Reference is for instance made to the following
methods :

a) DNA can be transfected into mammalian cells
with appropriate selection markers by a variety of
techniques, such as calcium phosphate precipitation,
polyethylene glycol, protoplast-fusion, etc.

b) DNA fragments corresponding to genes can be
cloned into expression vectors for E. coli, yeast or
mammalian cells and the resultant proteins purified.

c) The proviral DNA can be "shot-gunned"
(fragmented) into procaryotic expression vectors to
generate fusion polypeptides. Recombinants producing
antigenically competent fusion proteins can be
identified by simply screening the recombinants with
antibodies against LAV antigens.

The invention further refers to DNA recombinants,
particularly modified vectors, including any of
the preceding DNA sequences adapted to transform
corresponding microorganisms or cells, particularly eucaryotic
cells such as yeasts, for instance Saccharomyces
cerevisiae, or higher eucaryotic cells, particularly
cells of mammals, and to permit expression of said DNA
sequences in the corresponding microorganisms or cells.
General methods of that type have been recalled in the
abovesaid PCT international patent application PCT/EP
85/00548, filed October 18, 1985.

More particularly the invention relates to
such DNA recombinant vectors modified by the
abovesaid DNA sequences and which are capable of
transforming higher eucaryotic cells, particularly mammalian
cells. Preferably, any of the abovesaid sequences are
placed under the direct control of a promoter contained
in said vectors and recognized by the polymerases of
said cells, such that the first nucleotide codons
expressed correspond to the first triplets of the
above-defined DNA sequences. Accordingly, this invention
also relates to the corresponding DNA fragments which
can be obtained from the genome of LAV MAL or its cDNA by
any appropriate method. For instance, such a method
comprises cleaving said LAV MAL genome or its cDNA by
restriction enzymes preferably at the level of
restriction sites surrounding said fragments and close to the
opposite extremities respectively thereof, recovering
and identifying the fragments sought according to sizes,
if need be checking their restriction maps or nucleotide
sequences (or by reaction with monoclonal antibodies
specifically directed against epitopes carried by the
polypeptides encoded by said DNA fragments), and further
if need be, trimming the extremities of the fragment,
for instance by an exonucleolytic enzyme such as Bal31,
for the purpose of controlling the desired nucleotide
sequences of the extremities of said DNA fragments or,
conversely, repairing said extremities with Klenow
enzyme and possibly ligating the latter to synthetic
polynucleotide fragments designed to permit the
reconstitution of the nucleotide extremities of said
fragments. Those fragments may then be inserted in any
of said vectors for causing the expression of the
corresponding polypeptide by the cell transformed
therewith. The corresponding polypeptide can then be
recovered from the transformed cells, if need be after lysis
thereof, and purified by methods such as
electrophoresis. Needless to say, all conventional methods for
performing these operations can be resorted to.
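
The cleavage and size-selection steps just described can be pictured by a minimal in silico sketch, assuming a linear sequence read on one strand and the HindIII recognition site AAGCTT, cut after the first A. The function names and the twenty-nucleotide test sequence are hypothetical; the sketch illustrates the principle and is not the procedure used to obtain the clones.

```python
def digest(sequence, site="AAGCTT", cut_offset=1):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of nucleotides of the site left on the
    upstream fragment (1 for HindIII, which cuts A^AGCTT).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def select_by_size(fragments, min_len, max_len):
    """Keep the fragments whose length lies within the given window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical sequence carrying two HindIII sites.
fragments = digest("GGAAGCTTCCCCAAGCTTTT")
print(fragments)
print(select_by_size(fragments, 8, 12))
```

A complete digestion corresponds to cutting at every site, as here; a partial digestion would leave some sites uncut and is not modelled.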

The invention also relates more specifically 
to cloned probes which can be made starting from any DNA 
fragment according to this invention, thus to recombi- 
nant DNAs containing such fragments, particularly any 
plasmids amplifiable in procaryotic or eucaryotic cells 
and carrying said fragments. Using the cloned DNA 
fragments as a molecular hybridization probe - either by 
labelling with radionucleotides or with fluorescent 
reagents - LAV virion RNA may be detected directly in 
the blood, body fluids and blood products (e.g., the
antihemophilic factors such as Factor VIII concentrates)
and vaccines (e.g., hepatitis B vaccine). It has already
been shown that whole virus can be detected in culture
supernatants of LAV producing cells. A suitable method
for achieving that detection comprises immobilizing
virus on a support (e.g., a nitrocellulose filter),
disrupting the virion and hybridizing with labelled
(radiolabelled or "cold" fluorescent- or enzyme-labelled)
probes. Such an approach has already been developed
for Hepatitis B virus in peripheral blood [SCOTTO J. et 
al. Hepatology (1983), 3, 379-384]. 

Probes according to the invention can also be
used for rapid screening of genomic DNA derived from the
tissue of patients with LAV related symptoms, to see if
the proviral DNA or RNA present in host tissue and other
tissues can be related to that of LAV MAL.

A method which can be used for such screening
comprises the following steps : extraction of DNA from
tissue, restriction enzyme cleavage of said DNA,
electrophoresis of the fragments, Southern blotting
of genomic DNA from tissues and subsequent hybridization
with labelled cloned LAV proviral DNA. Hybridization in
situ can also be used.

Lymphatic fluids and tissues and other
non-lymphatic tissues of humans, primates and other
mammalian species can also be screened to see if other
evolutionarily related retroviruses exist. The methods
referred to hereinabove can be used, although
hybridization and washings would be done under non-stringent
conditions.

The DNAs or DNA fragments according to the 
invention can be used also for achieving the expression 
of viral antigens of LAV MAL for diagnostic purposes. 

The invention relates generally to the poly- 
peptides themselves, whether synthesized chemically, 
isolated from viral preparations or expressed by the 
different DNAs of the invention, particularly by the 
ORFs or fragments thereof in appropriate hosts, par- 
ticularly procaryotic or eucaryotic hosts, after trans- 
formation thereof with a suitable vector previously 
modified by the corresponding DNAs. 

More generally, the invention also relates to
any of the polypeptide fragments (or molecules,
particularly glycoproteins having the same polypeptidic
backbone as the polypeptides mentioned hereinabove) bearing
an epitope characteristic of a protein or glycoprotein
of LAV MAL, which polypeptide or molecule then has
N-terminal and C-terminal extremities respectively
either free or, independently from each other,
covalently bonded to amino acids other than those which are
normally associated with them in the larger polypeptides
or glycoproteins of the LAV virus, which last mentioned
amino acids are then free or belong to another
polypeptide sequence. Particularly, the invention relates to
hybrid polypeptides containing any of the
epitope-bearing-polypeptides which have been defined more
specifically hereinabove, recombined with other polypeptide
fragments normally foreign to the LAV proteins, having
sizes sufficient to provide for an increased
immunogenicity of the epitope-bearing-polypeptide, said
foreign polypeptide fragments either being
immunogenically inert or not interfering with the immunogenic
properties of the epitope-bearing-polypeptide.

Such hybrid polypeptides, which may contain
from 5 up to 150, even 250 amino acids, usually consist
of the expression products of a vector which contained
ab initio a nucleic acid sequence expressible under the
control of a suitable promoter or replicon in a suitable
host, which nucleic acid sequence had however beforehand
been modified by insertion therein of a DNA sequence
encoding said epitope-bearing-polypeptide.

Said epitope-bearing-polypeptides, particular- 
ly those whose N-terminal and C-terminal amino acids are 
free, are also accessible by chemical synthesis according
to techniques well known in the chemistry of proteins.

The synthesis of peptides in homogeneous
solution and in solid phase is well known. In this
respect, recourse may be had to the method of synthesis
in homogeneous solution described by Houben-Weyl in the
work entitled "Methoden der Organischen Chemie" (Methods
of Organic Chemistry) edited by E. Wunsch, vol. 15-I
and II, Thieme, Stuttgart 1974. This method of synthesis
consists of successively condensing either the
successive amino acids in twos, in the appropriate order
or successive peptide fragments previously available or
formed and containing already several amino-acyl
residues in the appropriate order respectively. Except
for the carboxyl and amino groups which will be engaged
in the formation of the peptide bonds, care must be
taken to protect beforehand all other reactive groups
borne by these amino-acyl groups or fragments. However,
prior to the formation of the peptide bonds, the
carboxyl groups are advantageously activated, according
to methods well known in the synthesis of peptides.
Alternatively, recourse may be had to coupling reactions
bringing into play conventional coupling reagents, for
instance of the carbodiimide type, such as
1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide. When
the amino acid group used carries an additional amine
group (e.g., lysine) or another acid function (e.g.,
glutamic acid), these groups may be protected by
carbobenzoxy or t-butyloxycarbonyl groups, as regards
the amine groups, or by t-butylester groups, as regards
the carboxylic groups. Similar procedures are available
for the protection of other reactive groups. For
example, an -SH group (e.g., in cysteine) can be
protected by an acetamidomethyl or paramethoxybenzyl
group.

In the case of a progressive synthesis, amino
acid by amino acid, the synthesis starts preferably with
the condensation of the C-terminal amino acid with the
amino acid which corresponds to the neighboring
amino-acyl group in the desired sequence and so on, step by
step, up to the N-terminal amino acid. Another preferred
technique which can be used is that described by R.D.
Merrifield in "Solid Phase Peptide Synthesis" [J. Am.
Chem. Soc., 85, 2149-2154]. In accordance with the
Merrifield process, the first C-terminal amino acid of
the chain is fixed to a suitable porous polymeric resin,
by means of its carboxylic group, the amino group of the
amino acid then being protected, for example by a
t-butyloxycarbonyl group. When the first C-terminal
amino acid is thus fixed to the resin, the protective
group of the amine group is removed by washing the resin
with an acid, i.e., trifluoroacetic acid, when the
protective group of the amine group is a
t-butyloxycarbonyl group. Then, the carboxylic group of the second
amino acid, which is to provide the second amino-acyl
group of the desired peptidic sequence, is coupled to
the deprotected amine group of the C-terminal amino acid
fixed to the resin. Preferably, the carboxyl group of
this second amino acid has been activated, for example
by dicyclohexyl-carbodiimide, while its amine group has
been protected, for example by a t-butyloxycarbonyl
group. The first part of the desired peptide chain,
which comprises the first two amino acids, is thus
obtained. As previously, the amine group is then
deprotected, and one can further proceed with the fixing
of the next amino-acyl group and so forth until the
whole peptide sought is obtained. The protective groups
of the different side groups, if any, of the peptide
chain so formed can then be removed. The peptide sought
can then be detached from the resin, for example by
means of hydrofluoric acid, and finally recovered in
pure form from the acid solution according to
conventional procedures.
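
The order of operations in the solid phase process described above can be summarized by a short sketch that lists the steps for a given peptide. The function name and the three-residue example are hypothetical, "Boc" stands for the t-butyloxycarbonyl group, and the sketch only enumerates the cycle; it takes no account of side-chain chemistry or yields.

```python
def solid_phase_steps(peptide):
    """List the steps for assembling `peptide` on a resin.

    `peptide` is written from N-terminus to C-terminus in one-letter
    code; assembly proceeds from the C-terminal residue backwards.
    """
    if not peptide:
        raise ValueError("peptide must contain at least one residue")

    residues = list(peptide)
    steps = [f"attach Boc-{residues[-1]} to resin via its carboxyl group"]
    for residue in reversed(residues[:-1]):
        # Each cycle: free the amine of the bound chain, then couple
        # the next amine-protected, carboxyl-activated amino acid.
        steps.append("remove Boc with trifluoroacetic acid")
        steps.append(f"couple activated Boc-{residue}")
    steps.append("remove side-chain protecting groups")
    steps.append("cleave peptide from resin with hydrofluoric acid")
    return steps


# Hypothetical tripeptide Gly-Ala-Lys.
for step in solid_phase_steps("GAK"):
    print(step)
```

A peptide of n residues thus requires one attachment, n - 1 deprotection and coupling cycles, and the two final steps.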

As regards the peptide sequences of smallest
size bearing an epitope or immunogenic determinant, and
more particularly those which are readily accessible by
chemical synthesis, it may be required, in order to
increase their in vivo immunogenic character, to couple
or "conjugate" them covalently to a physiologically
acceptable and non-toxic carrier molecule. By way of
examples of carrier molecules or macromolecular supports
which can be used for making the conjugates according to
the invention can be mentioned natural proteins, such as
tetanic toxoid, ovalbumin, serum-albumins, hemocyanins,
etc. Synthetic macromolecular carriers, for example
polylysines or poly(D-L-alanine)-poly(L-lysine)s, can be
used too. Other types of macromolecular carriers that
can be used, which generally have molecular weights
higher than 20,000, are known from the literature. The
conjugates can be synthesized by known processes such as
are described by Frantz and Robertson in "Infect. and
Immunity", 33, 193-198 (1981) and by P.E. Kauffman in
"Applied and Environmental Microbiology", October 1981
Vol. 42, No. 4, pp. 611-614. For instance, the following
coupling agents can be used : glutaric aldehyde, ethyl
chloroformate, water-soluble carbodiimides such
as N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide,
HCl, diisocyanates, bis-diazobenzidine, di- and
trichloro-s-triazines, cyanogen bromides and
benzoquinone, as well as the coupling agents mentioned
in "Scand. J. Immunol.", 1978, vol. 8, pp. 7-23
(Avrameas, Ternynck, Guesdon).

Any coupling process can be used for bonding
one or several reactive groups of the peptide, on the
one hand, and one or several reactive groups of the
carrier, on the other hand. Again coupling is
advantageously achieved between carboxyl and amine groups
carried by the peptide and the carrier or vice-versa in
the presence of a coupling agent of the type used in
protein synthesis, e.g.,
1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide, N-hydroxybenzotriazole, etc. Coupling
between amine groups respectively borne by the peptide
and the carrier can also be made with glutaraldehyde,
for instance, according to the method described by
BOQUET, P. et al. (1982) Molec. Immunol., 19, 1441-1549,
when the carrier is hemocyanin.

The immunogenicity of epitope-bearing-peptides
can also be reinforced by oligomerisation thereof, for
example in the presence of glutaraldehyde or any other
suitable coupling agent. In particular, the invention
relates to the water soluble immunogenic oligomers thus
obtained, comprising particularly from 2 to 10 monomer
units.

The glycoproteins, proteins and other polypep- 
tides (generally designated hereinafter as "antigens" of 
this invention) whether obtained by methods, such as are 
disclosed in the earlier patent applications referred to 
above, in a purified state from LAV MAL virus prepara- 
tions or - as concerns more particularly the peptides - 
by chemical synthesis, are useful in processes for the 
detection of the presence of anti-LAV antibodies in 
biological media, particularly biological fluids such as 
sera from man or animal, particularly with a view of 
possibly diagnosing LAS or AIDS. 

Particularly the invention relates to an in
vitro process of diagnosis making use of an envelope
glycoprotein or of a polypeptide bearing an epitope of
this glycoprotein of LAV MAL for the detection of
anti-LAV antibodies in the serums of persons who carry them.
Other polypeptides - particularly those carrying an
epitope of a core protein - can be used too.

A preferred embodiment of the process of the 
invention comprises : 

- depositing a predetermined amount of one or several of 
said antigens in the cups of a titration microplate ; 

- introducing increasing dilutions of the biological 
fluid, to be diagnosed (e.g., blood serum, spinal fluid, 
lymphatic fluid, and cephalo-rachidian fluid), into 
these cups ; 

- incubating the microplate ; 

- washing carefully the microplate with an appropriate
buffer ;

- adding into the cups specific labelled antibodies
directed against blood immunoglobulins and

- detecting the antigen-antibody-complex formed, which 
is then indicative of the presence of LAV antibodies in 
the biological fluid. 

Advantageously the labelling of the anti-immu- 
noglobulin antibodies is achieved by an enzyme selected 
from among those which are capable of hydrolysing a 
substrate, which substrate undergoes a modification of 
its radiation-absorption, at least within a
predetermined band of wavelengths. The detection of the
substrate, preferably comparatively with respect to a 
control, then provides a measurement of the potential 
risks, or of the effective presence, of the disease. 

Thus, preferred methods of immuno-enzymatic
and also immunofluorescent detections, in particular
according to the ELISA technique, are provided.
Titrations may be determinations by immunofluorescence
or direct or indirect immuno-enzymatic determinations.
Quantitative titrations of antibodies on the serums
studied can be made.
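
By way of illustration only, a quantitative titration from increasing dilutions of a serum can be read as an end-point titer, that is, the highest dilution still giving a signal above a threshold derived from the control. The sketch below assumes absorbance readings, dilutions expressed as reciprocals (100 for 1/100), and a threshold of twice the control absorbance; the function name, the threshold factor and the readings are all hypothetical and are not taken from this specification.

```python
def endpoint_titer(dilutions, absorbances, control_absorbance, factor=2.0):
    """Highest reciprocal dilution whose absorbance exceeds the cutoff.

    The cutoff is `factor` times the control absorbance (an assumed
    convention). Returns None when no dilution is positive.
    """
    if len(dilutions) != len(absorbances):
        raise ValueError("one absorbance is required per dilution")
    cutoff = factor * control_absorbance
    positive = [d for d, a in zip(dilutions, absorbances) if a > cutoff]
    return max(positive) if positive else None


# Hypothetical readings for a two-fold dilution series.
titer = endpoint_titer(
    dilutions=[100, 200, 400, 800],
    absorbances=[1.2, 0.8, 0.35, 0.1],
    control_absorbance=0.1,
)
print(titer)
```

Reading the signal comparatively with respect to a control, as stated above, is what the cutoff represents; the particular factor would have to be established for each assay.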

The invention also relates to the diagnostic
kits themselves for the in vitro detection of antibodies
against the LAV virus, which kits comprise any of the
polypeptides identified herein and all the biological
and chemical reagents, as well as equipment, necessary
for performing diagnostic assays. Preferred kits comprise
all reagents required for carrying out ELISA assays.
Thus preferred kits will include, in addition to any of
said polypeptides, suitable buffers and anti-human
immunoglobulins, which anti-human immunoglobulins are
labelled either by an immunofluorescent molecule or by an
enzyme. In the last instance, preferred kits also
comprise a substrate hydrolysable by the enzyme and
providing a signal, particularly modified absorption of a
radiation, at least in a determined wavelength, which
signal is then indicative of the presence of antibody in
the biological fluid to be assayed with said kit.

It can of course be of advantage to use
several proteins or polypeptides not only of LAV MAL but
also of LAV ELI, together with homologous proteins or
polypeptides of earlier described viruses, such as LAV BRU,
HTLV-3, ARV 2, etc.

The invention also relates to vaccine
compositions whose active principle is to be constituted by any
of the antigens, i.e., the hereinabove disclosed
polypeptides of LAV MAL, particularly the purified gp110 or
immunogenic fragments thereof, fusion polypeptides or
oligopeptides in association with a suitable
pharmaceutically or physiologically acceptable carrier. A first
type of preferred active principle is the gp110
immunogen among said immunogens. Other preferred active
principles to be considered in that field consist of
the peptides containing less than 250 amino acid units,
preferably less than 150, particularly from 5 to 150
amino acid residues, as deducible from the complete
genome of LAV MAL and even more preferably those peptides
which contain one or more groups selected from Asn-X-Thr
and Asn-X-Ser as defined above. Preferred peptides for
use in the production of vaccinating principles are
peptides (a) to (f) as defined above. By way of example,
there may be mentioned that suitable dosages of the
vaccine compositions are those which are effective to
elicit antibodies in vivo, in the host, particularly a
human host. Suitable doses range from 10 to 500
micrograms of polypeptide, protein or glycoprotein per
kg, for instance 50 to 100 micrograms per kg.
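
Peptides containing the groups Asn-X-Thr or Asn-X-Ser can be located in a protein sequence written in one-letter code by searching for N-X-S/T, as in the following minimal sketch. The function name and the test sequence are hypothetical. By default X is taken as any amino acid, as written above; the optional exclusion of proline at X is an added assumption reflecting a commonly applied refinement and is not stated in this specification.

```python
import re


def find_sequons(protein, exclude_proline=False):
    """Return 1-based positions of Asn in N-X-S/T groups.

    A lookahead is used so that overlapping groups (e.g. N-N-S-S)
    are all reported.
    """
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile(f"(?=N{middle}[ST])")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


# Hypothetical twelve-residue sequence.
print(find_sequons("MNGTANPSNNSS"))
print(find_sequons("MNGTANPSNNSS", exclude_proline=True))
```

The positions returned are those of the asparagine residue of each group, numbered from the first residue of the sequence supplied.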

The different peptides according to this
invention can also be used themselves for the production
of antibodies, preferably monoclonal antibodies specific
for the respective different peptides. For the
production of hybridomas secreting said monoclonal antibodies,
conventional production and screening methods can be
used. These monoclonal antibodies, which themselves are
part of the invention, provide very useful tools for the
identification and even determination of relative
proportions of the different polypeptides or proteins in
biological samples, particularly human samples
containing LAV or related viruses.

The invention further relates to the hosts 
(procaryotic or eucaryotic cells) which are transformed 
by the above mentioned recombinants and which are capa- 
ble of expressing said DNA fragments. 

Finally the invention also concerns vectors 
for transforming eucaryotic cells of human origin, 
particularly lymphocytes, the polymerases of which are 
capable of recognizing the LTRs of LAV. In particular, 
said vectors are characterized by the presence of a LAV 
LTR therein, said LTR being then active as a promoter 
enabling the efficient transcription and translation in 
a suitable host of a DNA insert coding for a determined 
protein placed under its control. 

Needless to say, the invention extends to all 
variants of genomes and corresponding DNA fragments 
(ORFs) having substantially equivalent properties, all 
of said genomes belonging to retroviruses which can be 
considered as equivalents of LAV. It must be understood 
that the claims which follow are also intended to 
cover all equivalents of the products (glycoproteins, 
polypeptides, DNAs, etc.) whereby an equivalent is a 
product, e.g., a polypeptide, which may differ from 
a product defined in any of said claims, say through one 
or several amino acids, while still having substantially 
the same immunological or immunogenic properties. A 
similar rule of equivalency shall apply to the DNAs, it 
being understood that the rule of equivalency will then 
be tied to the rule of equivalency pertaining to the 
polypeptides which they encode. 

It will also be understood that all the 
literature referred to hereinbefore and hereinafter and 
all patent applications and patents not specifically 
identified herein but which form counterparts of those 
specifically designated herein, must be considered as 
incorporated herein by reference. 

It should further be mentioned that the 
invention also relates to immunogenic compositions 
that preferably contain one or more of the polypeptides 
which are specifically identified above and which have 
the amino acid sequences of LAV MAL that have been 
identified, or peptidic sequences corresponding to 
previously defined LAV proteins. In this respect, the 
invention relates more particularly to the 
polypeptides which have the sequences corresponding more 
specifically to the LAV BRU sequences which have been 
referred to earlier, i.e., the sequences extending 
between the following first and last amino acids of the 
LAV BRU proteins themselves, i.e., the polypeptides 
having sequences contained in the LAV BRU OMP or LAV BRU 
TMP or sequences extending over both, particularly those 
extending between the following positions of the 
amino acids included in the env open reading frame of 
the LAV BRU genome, 

1-530 
34-530 
and more preferably 

531-877, particularly 680-700, 
37-130 
211-289 
488-530 
490-620. 

These different sequences can be used for any 
of the above defined purposes and in any of the 
compositions which have been disclosed. 
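
As an illustration of how the position ranges listed above delimit peptides, the following sketch cuts the corresponding segments out of an env amino acid sequence. It assumes that positions are 1-based and inclusive and are counted from the first residue of the env open reading frame; the text does not state this explicitly. The function name is hypothetical and the sequence used is a placeholder, not the LAV BRU sequence.

```python
# Position ranges as listed in the text, read as (first, last) residues.
ENV_RANGES = [
    (1, 530), (34, 530), (531, 877), (680, 700),
    (37, 130), (211, 289), (488, 530), (490, 620),
]


def slice_region(env_protein: str, first: int, last: int) -> str:
    """Return residues first..last (assumed 1-based, inclusive)."""
    if not (1 <= first <= last <= len(env_protein)):
        raise ValueError(f"range {first}-{last} lies outside the sequence")
    return env_protein[first - 1:last]


# Placeholder of the right length only; substitute the real env translation.
placeholder_env = "A" * 877

regions = {
    (first, last): slice_region(placeholder_env, first, last)
    for first, last in ENV_RANGES
}
for (first, last), segment in regions.items():
    print(f"{first}-{last}: {len(segment)} residues")
```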

Finally the invention also relates to the 
different antibodies which can be formed specifically 
against the different peptides which have been disclosed 
herein, particularly to the monoclonal antibodies which 
recognize them specifically. The corresponding hybridomas, 
which can be formed starting from spleen cells 
previously immunized with such peptides which are fused 
with appropriate myeloma cells and selected according to 
standard procedures, also form part of the invention. 

Phage λ clone E-H12 derived from LAV 
infected cells has been deposited at the CNCM under No. 
I-550 on May 9, 1986. Phage clone M-H11 derived from 
LAV MAL infected cells has been deposited at the CNCM 
under No. I-551 on May 9, 1986. 